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Abstract 

This  paper  is  concerned  with  the  task  of  visual  motion-based  navigation.  A 
critical  requirement  of  the  task  is  the  ability  to  estimate  3-D  depth  and 
motion  from  visual  information.  Recent  studies  have  demonstrated  that  the 
relevant  cues  is  contained  in  motion  parallax  or  optical  flow  and  that  flow 
field  divergence  and  hence  time-to-contact  can  be  extracted.  We  present  a 
new  concept  called  image  gradient  evolution  (IGE),  which  utilizes  the 
change  of  image  spatial  gradients  over  time  as  a threat  cue:  an 
approaching  object  induces  2-D  expanding  motion  and  causes  the  image 
spatial  structure  to  stretch  so  the  image  gradients  decrease.  Based  on  this 
idea,  our  method  offers  a one-step  solution  directly  from  image  gradients, 
instead  of  from  optical  flow  and  its  derived  properties.  We  use  a technique 
that  is  local  and  linear  so  the  implementation  can  be  very  fast.  The  threat 
map  is  expectedly  noisy  but  sufficiently  informative,  as  is  seen  in 
demonstrations  on  several  real  images.  These  two  aspects,  fast 
implementation  and  useful  qualitative  information,  provide  a viable 
solution  to  navigational  tasks. 

1.  Motivation 

We  start  with  a simple  illustration  to  introduce  the  idea  of  the  image  gradient  evolution 
(IGE)  and  emphasize  its  difference  from  the  optical  flow  approach. 

Figure  1 . 1 shows  a diverging  object  in  the  image.  Our  goal  is  to  obtain  an  algorithm  that 
can  warn  the  observer  of  a potential  collision. 
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Conventionally,  optical  flow  (Figure  1.2)  is  computed  first  and  then  the  first  order  flow 
field  properties  (diverging  or  converging)  is  used  to  characterize  the  underlying  objects’  3- 
D motion.  In  our  approach,  however,  the  change  of  the  image  gradients  over  time  is  com- 
puted. As  illustrated  in  Figure  2,  a decreasing  image  intensity  gradient(slope)  at  an  image 

Original  image 
intensity  profile 

Image  intensity  profile 
after  diverging 

Fig  2 1-D  image  gradient  evolution 


Decreasing 
slope 


point  over  time  signifies  a diverging  object.  As  shown  in  Figure  3,  we  wish  to  avoid  pro- 
cessing the  noisy  flow  data  when  it  is  evident  that  we  can  achieve  the  same  result  from  IGE. 

The  following  sections  are  organized  as  follows.  Section  2 discusses  our  work  in  the 
context  of  previous  research.  The  IGE  is  formulated  in  the  context  of  a generalized  motion 
model  in  Section  3.  Section  4 details  our  filtering  scheme  and  implementation  issues.  Sec- 
tion 5 demonstrates  experimental  results  on  real  images.  Section  6 concludes  the  paper. 
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Fig  3 Different  approaches  to  divergence 


2.  Previous  Work 

The  looming  effect  is  a major  cue  in  biological  vision  systems  used  to  sense  danger 
(Schiff,  et  al.[23] ).  Local  motion  parallax  is  in  turn  a cue  for  looming  or  divergence 
(Werkhoven  & Koenderink[27]  ).  Since  time-to-contact  is  related  to  both  divergence  and 
3-D  scene  structure,  solving  for  time-to-contact  is,  in  an  exact  sense,  equivalent  to  recov- 
ering 3-D  motion  and  structures.  There  is  a plethora  of  literature  dealing  with  recovering  3- 
D motion  and  structure  from  optical  flow  (Adiv[l]  , Bruss  & Hom[5]  , Negahdaripour  & 
Lee[20] ),  image  sequences  (Broida  & Chellappa[4]  ),  or  features  (Negahdaripour  & 
Horn[19]  , Tsai  & Huang[26] ).  It  is  basically  an  ill-posed  and  nonlinear  problem.  These 
methods  usually  use  iterative  optimization  techniques  which  are  often  time-consuming. 
However,  these  reports  have  established  the  feasibility  of  imposing  extra  constraints  and / 
or  using  derivatives  of  flow  to  solve  the  problem  in  theory.  The  intended  precision  of  these 
quantitative  methods  is  often  an  illusion  due  to  the  limit  on  accuracy  with  which  the  input 
measures  can  be  obtained  (Thompson  & Keamey[25]  ).  Accurate  optical  flow  estimation 
is  still  a difficult  problem. 

The  avoidance  of  differentiating  optical  flow  has  been  approached  in  different  ways. 
Integration  theorems  such  as  Green’s  theorem  (Poggio,  et  al.[22]  , Cipolla,  et  al.[9]  ), 
Stoke’s  theorem  [22] , and  Gauss’s  theorem  (Gupta[ll] ) are  used  to  estimate  first  order 
flow  parameters  directly  from  image  intensity  integrals[l  1]  , image  moments  (and  their 
temporal  derivatives)  [9]  , or  flow  integrals[22]  . The  integration  techniques  basically  trade 
off  noise  sensitivity  for  smoothness,  which  arises  from  a single  motion  assumption  within 
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the  integration  contour.  To  prevent  the  integration  contour  from  going  across  boundaries, 
additional  mechanisms,  probably  global,  are  required  to  segment  images. 

Another  approach  models  the  local  motion  with  an  affine  model.  It  uses  higher  order 
pointwise  image  derivatives  (Nagel[18] , Werkhoven  & Koenderink[27] ) or  patchwise 
motion  coherence  (Campani  & Verri  [7] , Bergen,  et  al.[3] ) techniques  to  solve  for  first  or- 
der motion  parameters.  This  approach  is  actually  aimed  at  accurate  flow  estimation  and  the 
reports  make  little  mention  of  3-D  motion  estimation. 

Since  the  above  two  approaches  do  not  model  the  3-D  structure,  they  do  not  offer  suf- 
ficient information  to  solve  for  time-to-contact  without  additional  constraints  or  differenti- 
ations in  the  general  case.  It  has  been  proved  that  only  the  upper  and  lower  bounds  on  time- 
to-contact  can  be  derived  (Subbarao[24]  )(Cipolla,  et  al.[9]  ).  Meyer  and  Bouthemy[14] 
used  temporal  derivatives  on  the  first  order  parameters  to  circumvent  the  problem.  Essen- 
tially it  is  equivalent  to  second  order  derivatives,  but  Kalman  filtering  on  the  temporal  de- 
rivatives makes  the  result  much  smoother  and  more  practical.  However,  a fast  implemen- 
tation is  relatively  difficult. 

Nelson  & Aloimonos[21]  were  the  first  to  attempt  a realistic  approach  to  navigation. 
Their  algorithm  computes  directional  divergence,  which  is  a second  order  flow  parameter, 
and  can  be  very  noisy.  Coombs,  et  al.[10]  employed  flow  divergence  for  real-time  obstacle 
avoidance.  Their  obstacle  avoidance  system  currently  appears  to  be  the  fastest  one  using 
flow  divergence.  In  their  system,  the  time-to-contact  is  equivalent  to  divergence  when  care- 
fully controlling  the  camera  so  as  to  approximately  translate  in  the  direction  of  the  optical 
axis.  The  computation  of  divergence,  however,  is  based  on  noisy  flow  and  requires  tempo- 
ral integration  in  order  to  interpret  the  result.  Camus[8]  implemented  a real-time  algorithm 
for  time-to-contact  which  is  quite  reliable.  However,  the  application  is  limited  by  its  restric- 
tive assumptions  about  motion  and  surfaces.  Kundur  & Raviv[14]  proposed  the  use  of  an 
image  quality  measure  for  the  visual  threat  cue.  This  method  exploits  the  camera  defocus- 
ing  effect  for  navigation. 
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The  major  contribution  of  this  paper  is  to  pioneer  the  idea  of  IGE  and  its  use  as  a cue 
for  threat  during  navigation.  Such  a capability  is  embedded  in  a generalized  2-D  motion 
equation  that  also  models  expansion.  An  integrated  spatio-temporal  filtering  scheme  is  de- 
signed to  estimate  image  derivatives  in  a numerically  coherent  manner.  Using  these  deriv- 
atives, IGE  and  optical  flow  can  quickly  be  estimated  at  the  same  time  in  a fast  manner. 


3.  Generalized  Motion  Model 


The  brightness  constancy  equation  is  often  interpreted  in  the  following  way: 

V/  = 0 =>  I(x,  y,  t)  = F(x-ut,  y-vt) . (1) 

The  first  step  in  measuring  IGE  is  to  extend  the  image  motion  model  from  simple  2-D 

translation  to  translation  plus  expansion.  A 3-D  point  at  position  P = ( X , Y,  Z ) , under 

perspective  projection,  projects  to  a point  in  the  2-D  image  plane,  (x,  y) , 


x = fX/Z 


, where  f is  the  focal  length  of  the  projection. 


(2) 


y = fY/z 

Let  there  be  relative  3-D  translational  motion  P{t ) = (X  + Uxt,  Y + UYt,  Z - Uzt) . 
Hence, 


x(t)  = f (X  + U Xt)/(Z  - Uzt) 
y(t)  = f(Y+uyt)/(Z-uzt )' 
Brightness  constancy  and  (3)  yield 


I(x,  y,  t)=  F\x\l-~Yt  I- 


fUx 

Z 


(4) 


3-D  translation  only  is  assumed  for  its  simplicity  since  3-D  rotation  has  no  bearing  on 
expansion  (Koenderink[13]  ).  A general  motion  model  that  includes  rotation  parameters 
can  be  found  in  (Liu,  et  al.[16]  ).  The  following  equations  can  also  be  derived  from  the  gen- 
eral motion  model. 


To  understand  the  generalized  motion  equation  better,  we  describe  equation  (4)  in  the 
context  of  optical  flow. 
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(5) 


(«,v)  = (tm  = 


fu> 


dt’dt 


Uzx 


fU, 


Z-Uzt  Z-Uzt  Z-Uzt 


ffUx  , Uzx  fUy  Uzy 

l z + z ’ z + z 


uzy  \ 

Z-Uzt)~ 


uz 

Let  — be  denoted  by  s , and 


(fUx  fUy 
{ Z ’ z 


by  (p,  q). 


Rewriting  (4)  and  (5): 

/ (x,  y,  t)  = F(x(l  - st)  - pt,  y(l  - st ) - qt)  (6) 

(w,  v)  = (p  + sx,  q + sy).  (7) 

Equation  (7)  lays  out  the  two  components  of  optical  flow:  (p,q),  and  (sx,  sy) . From 
the  linear  dependency  of  (sx,  sy)  on  (x,  y) , s is  interpreted  as  expansion.  The  translation 
component  (p,  q)  is  induced  by  (Ux,  UY)  only,  and  the  expansion  component  5 by  Uz 
only.  When  s = 0 , then  (u,  v)  = (p,  q)  and  (6)  reduces  to  (1). 


The  IGE  is  characterized  by  the  following  equation  (derived  in  Appendix  A), 


Fl(x,y,t)  = (l-st)^-I(x\y',  0), 
OX  ox 


X (l-st)-pt 
y(l-st)-qt 


(8) 


An  image  point  (x',  /)  at  time  0 moves  to  (x,  y)  at  time  t . The  two  image  gradients 
are  related  by  (8).  When  5 is  positive,  which  means  the  object  is  approaching  ( Uz  > 0 , in 


(4)),  the  slope  is  decreasing  (1  - st  < 1 ).  This  coincides  with  our  previous  observation  in 
Fig  2.  In  the  extreme  case,  when  st  = 1 , 


t = - = 77-  and  (x,  y,  t)  = 0. 
s Uz  ox 


(9) 


In  equation  (9),  t is  interpreted  as  the  time-of-contact.  At  that  instant,  the  entire  image 


texture  disappears,  which  is  what  we  observe  when  an  object  is  too  close.  It  is  clear  that  s 
measures  not  only  2-D  expansion,  but  also  IGE  and  time-to-contact.  We  can  use  it  as  a cue 
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for  visual  threat. 


Note  that  the  focus  of  expansion  (FOE)  is  predefined  to  be  at  ( 0 , 0 ) in  (4).  To  complete 
the  formulation,  we  modify  (6)  and  (7)  to  allow  the  FOE  to  be  at  an  arbitrary  location 

(*o>  yo)  • 

I (x,  y,  t)  = (10) 

F((x  - x0)(l  - st)-pt  +x0,(y-  y0)(l  - st)-qt  + y0) 

(u,v)  = (p  + s(x-x0),q  + s(y-y0)).  (11) 

Note  that  equation  (8)  is  derived  based  on  the  assumption  of  a parallel  frontal  surface, 
i.e.,  the  surface  normal  is  parallel  to  the  optical  axis.  When  the  surface  is  not  parallel  fron- 
tal, the  IGE  cannot  be  reliably  interpreted  for  3-D  motion.  However,  in  order  to  use  IGE  as 
a qualitative  cue  for  threat,  our  algorithm  identifies  other  types  of  surfaces  and  potential 
boundaries  as  outliers.  A post-smoothing  stage  then  overwhelms  the  errors  induced  by  the 
outliers.  This  technique  is  reasonable  because  “divergence  due  to  a relatively  distant  object 
can  be  large,  but  only  over  a short  distance  in  the  image”  (Nelson  & Aloimonos[21]  ). 

Compared  with  other  approaches  that  attempt  to  compute  absolute  time-to-contact 
(Burlina  & Chellappa  [6] ),  our  method  appears  inexact.  In  fact,  we  use  a simplified  model 
of  the  affine  motion  with  reasonable  assumptions.  It  is,  however,  a sound  approach  since 
humans  navigate  in  complex  environments  without  estimating  time-to-contact  exactly.  In 
addition,  it  is  important  to  be  able  to  navigate  in  an  unknown  environment  where  the  scene 
and  objects  change  dynamically,  therefore  there  is  little  point  in  spending  time  computing 
exact  time-to-contact  for  the  underlying  scene  [25]  . For  navigational  tasks,  it  is  more  im- 
portant to  build  a “qualitative"  threat  map  quickly  rather  than  to  build  an  accurate  time-to- 
contact  map  slowly. 

4.  Algorithm  and  Implementation 

To  facilitate  the  estimation  of  IGE  in  the  framework  of  the  general  motion  model,  a po- 
tent and  stable  image  differentiation  filtering  scheme  is  needed.  The  set  of  orthogonal  3-D 
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Hermite  polynomial  filters  is  excellent  for  the  task. 


4.1  Hermite  polynomials 

The  nth  Hermite  polynomial  Hn(x)  is  a solution  of 


d H„  dH„ 

— - -2 x—  + 2nH„  = 0. 


dx 


dx 


(12) 


The  Hn(x)  are  derived  by  Rodrigues’  formula  [12] 


H„(x)  = (-lfex2J-e-* 
dx 


(13) 


By  substituting  G(x)  (with  variance  o ) for  e in  (13),  we  generalize  to  Hermite 
polynomials  with  respect  to  the  Gaussian  function.  Let  these  Hermite  polynomials  be 
denoted  by  Hn(x).  Then 


Hn(x)  = I -L-  f H 1 x 


~l/2  ) n\~l/2 

2 ay  v2  a 


(14) 


The  scalar  product  of  two  functions  and  the  L2-norm  of  a function  with  G(x)  as  a 
weight  function  are  defined  as: 


(a,  b)  = J G(x)a(x)b(x)dx  and  ||a||  = (a,  a) 


1/2 


(15) 


The  orthogonality  of  {Hn(x) } can  be  expressed  in  the  following  way [12]  : 

= <T2V5„  (16) 

The  3D  case  of  Hermite  polynomials  is  especially  simple  because  they  are  separable: 

Hljk(x,  y,  t ) = Hi(x)  • Hj(y)  ■ Hk(t ) (17) 

4.2  Derivation  of  gradient  constraint  equations 

We  use  the  following  Gaussian  derivative  theorem  to  derive  motion  constraint  equa- 
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tions. 


Theorem  1:  A one  dimensional  signal  I(x ) can  be  expanded  in  terms  of  Hermite  poly- 
nomials as 


I(x) 


I h 


k = 0 


Hk(x) 

II  All 2 


Then  Ik  = (/,  Hk)  = </'*’,  //„) , where  Ill> 


k 

d I 
dxk 


(18) 


Recall  our  motion  model, 

I(x,y,t ) = (19) 

F((x-x0)(l  -st)- pt+x0,(y- y0)(l  -st)- qt+y0) 

Expand  both  sides  with  Hermite  polynomials, 


oo  oo  oo 


H 


oo  oo  oo 


Hit 


ESI  fy-2  = SEE  ^jjfnh then 

i = Oj  = Ok  = 0 11“  ^11  i = 0 j = Ok  = 0 H“9*ll 

hjk  = 0’Hijk)  = Fijk  = ( F,Hl]k > 

Equating  I tj]  to  and  using  Theorem  1,  we  derive  (see  Appendix  B) 


(20) 


I ijl  ~ -ul (i+  ])j0-vl Kj  + ])0-(i  + j)sl ij0  where  ( u , v)  are  defined  in  (11).  (21) 

Equation  (21)  is  linear  in  terms  of  optical  flow  ( u , v)  and  IGE  5 . Using  00,  01, 10  for 
ij , we  can  derive  three  equations  up  to  the  second  order  and  solve  the  linear  system.  In  our 
implementation,  we  use  six  equations  up  to  the  third  order  to  form  a least  square  formula- 
tion. The  reason  is  that  the  residual  is  an  excellent  reliability  measure.  In  fact,  if  we  consider 
the  residual  as  the  extent  to  which  the  motion  model  is  violated,  it  can  be  used  to  indicate 
noise,  non-frontal  surfaces,  and  boundaries  (Liu,  et  al.[17]  ).  Since  s is  noisy,  our  imple- 
mentation exploits  smoothing  with  confidence  weighting,  i.e.,  extra  steps  of  smoothing  on 
s with  the  reciprocal  of  the  residual  as  weighting.  This  smooths  out  noise  but  prevents 
smoothing  across  boundaries.  It  also  overwhelms  the  errors  due  to  non-frontal  surfaces  as 
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long  as  there  is  a portion  within  the  object  that  is  parallel  frontal  or  nearly  parallel  frontal. 
If  the  objects’  surfaces  are  nowhere  frontal,  we  direct  the  robot  to  look  and  move  in  the 
same  direction  in  a piecewise  manner  using  pan/tilt  camera  control,  thus  eliminating  lateral 
motion  that  may  confuse  the  algorithm. 


5.  Experiments 

The  flow  portion  of  the  algorithm  has  been  shown  to  be  very  accurate  (Liu,  et  al.[16] ) 
compared  with  other  current  algorithms [2]  . The  following  figures  (Fig  4-Fig  6)  show  one 


Fig  4 Approaching  box  and  threat  map 


image  of  the  sequence  and  its  3-D  perspective  threat  map  based  on  IGE  5 . In  the  3-D  threat 
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maps,  elevation  is  used  to  depict  threat.  So  the  elevated  area  represents  closer  objects.  The 
threat  value  is  thresholded  to  enhance  3-D  structure  perception.  For  example,  in  Figure  4, 
the  cereal  box  stands  out  in  the  center  because  most  of  the  threat  values  within  the  box  are 
above  the  threshold,  therefore  a navigational  algorithm  can  then  use  this  threat  map  and 
veer  aside  to  avoid  the  dangerous  area  in  the  center;  in  Figure  5,  the  threat  map  is  gradually 


irregular  clouds 


elevated  from  the  valley  to  the  mountain  in  the  lower  left  comer.  The  sky  is  perceived  as 
mostly  safe  except  at  some  areas  where  the  clouds  change  brightness  irregularly  and  de- 
ceive the  algorithm.  A navigational  algorithm  can  use  this  threat  map  to  avoid  heading  to- 
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wards  the  lower  left  corner;  in  Figure  6,  the  metal  plate  in  the  lower  left  comer  is  extracted; 
the  Coke  can  and  the  platform  are  also  partially  visible;  the  pole  on  the  right  side  is  visible 


Fig  6 NASA  sequence  and  threat  map 


at  both  of  its  ends.  The  fact  that  the  metal  plate  is  detected  and  the  hole  remains  intact  dem- 
onstrates the  effect  of  smoothing  with  confidence  weighting.  We  are  currently  working  on 
a real-time  implementation  of  the  algorithm.  On  64x64  images,  IGE  without  smoothing  can 
be  expected  at  the  rate  of  3-4  frames  per  second  on  a HyperSparc  10  MP  board.  The  amount 
of  smoothing  is  dependent  on  the  image  noise  and  may  take  a little  more  time  than  required 
by  computing  IGE.  Although  the  threat  map  is  noisy  and  currently  its  resolution  is  limited 
by  the  noise  and  irregular  brightness  change,  it  already  provides  useful  information  for  nav- 
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igation. 


6.  Conclusion 

The  use  of  the  IGE  eliminates  the  need  to  process  noisy  optical  flow.  Image  gradient 


evolution  has  been  shown  to  be  a useful  cue  for  threat.  Our  algorithm  builds  a dense  qual- 


itative threat  map  based  on  IGE.  For  navigation,  we  would  rather  compute  useful,  probably 
inexact,  information  quickly  than  exact  information  slowly.  That  is  what  our  algorithm 
achieves. 

Appendix  A. 

We  prove  equation  (8)  here. 


x'  = x(l-st)-pt 
y = y{l-st)-qt  ■ 


(1) 


Assume  the  surface  is  parallel  frontal,  so  = 0 

ox 


(22) 


/ = y(l-st)-qt 


Appendix  B. 

We  prove  equation  (25)(21)  here 


First,  from  (12)  and  (14),  we  establish 
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(2) 


xHn(x)  = o2Hn  + 1(x)  + nHn_1(x). 
Equating  I lj}  to  F l}]  and  using  Theorem  1, 


hi 


Fij,  = (F,  Hm)  = <|f,  Hijo) 


-<e- 


(x-x0)+pyi  My-y0)  + q^i  _ 
1-st  Jdx  l 1-st 


(23) 


Practically,  s « 1 so  l - st  ~ 1.  Equation  (23)  can  be  approximated  by 


-{(s(x-x0)  + p)^-+  (s(y  - y0)  + Hij0)  or -((sx  + u)^-+  (sy  + v)|j,  Hij0)  (24) 


dx  ' 0 dy  J dx 

Using  Theorem  1 again,  we  derive 



- uI(i  + 1)10  - vIi(j  + 7)0  - y0>  - * <^y  ^y0>  • 

Equation  (3)  and  (2)  yield 

~uI(i+  l)jO~vIi{j+  l)0~(l  + jFI  ijcrs<32(I(i  + 2)jO  + Ii(j  + 2)o'> 


(25) 
(3) 

(26) 


The  last  term  in  (26)  involves  higher  order  differentiation,  which  often  suffers  from 
quantization  error  due  to  limited  filter  support.  Furthermore,  it  is  very  small  in  smooth 
images.  We  choose  to  ignore  it  in  practice.  Hence,  we  have  proved  equation  (21). 
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